KrigR - Climate Data For Your Spatial Analyses
An Introduction To The KrigR Package
Setting Things Up
Installing KrigR
First of all, we need to install KrigR. Since we are currently preparing the package for submission to CRAN, it is only available through the associated GitHub repository (https://github.com/ErikKusch/KrigR) for the time being. There, you may also find presentations as well as this guide/workshop.
Here, we use the devtools package within R to get access to the install_github() function. For this to run, we need to tell R to not stop the installation from GitHub as soon as a warning is produced. This would stop the installation process as early as one of the KrigR dependencies hits a warning as benign as “Package XYZ was built under R Version X.X.X” which can usually be ignored safely.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
devtools::install_github("https://github.com/ErikKusch/KrigR")
library(KrigR)For the sake of this tutorial, we need some extra packages for visualisations. We check if they are already installed, subsequently install (if necessary) and load them with this user-defined function:
install.load.package <- function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, repos = "http://cran.us.r-project.org")
}
require(x, character.only = TRUE)
}
package_vec <- c(
"tidyr", # for turning rasters into ggplot-dataframes
"ggplot2", # for plotting
"viridis", # colour palettes
"cowplot", # gridding multiple plots
"ggmap", # obtaining google maps
"gimms" # to get some pre-existing data to match in our downscaling
)
sapply(package_vec, install.load.package)## tidyr ggplot2 viridis cowplot ggmap gimms
## TRUE TRUE TRUE TRUE TRUE TRUE
Before we can proceed, you need to open up an account at the CDS and create an API key by following this link: https://cds.climate.copernicus.eu/api-how-to. This is required so that you may issue download requests through the KrigR package. Once you have done so, please register the user ID and API Key as characters as seen below:
Setting Up Directories
The tutorial is designed to run completely from scratch. For this to work in a structured way, we create a folder/directory structure so that we got some nice compartments on our hard drives for where each separate Kriging process is run. We create the following directories:
- A data directory for all of our individual Kriging processes
- A shapefile directory (located within our data directory) for all of the shapefiles that we will use
Downloading Shapefiles
Here, we download some of the most commonly used shapefiles (for our analyses using KrigR so far, at least). For repeat code-sourcing, we have implemented checks which only trigger the download of a given shapefile set if the data in question is not present in our Shapes directory yet.
This tutorial doesn’t make use of all the shapefiles we download here. We simply thought it prudent to show you what we have found to be useful and how to get your hands on the data in a reproducible way.
Land Cover
As we will see in this tutorial, masking out unnecessary areas from our analyses speeds up Kriging tremendously. Often, this will entail limiting data sets to terrestrial habitats. This land mask here does a terrific job at masking out non-land areas.
#### LAND MASK (for masking covariates according to land vs. sea)
if (!file.exists(file.path(Dir.Shapes, "LandMask.zip"))) { # if not downloaded yet
download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_land.zip",
destfile = file.path(Dir.Shapes, "LandMask.zip")
) # download cultural vector
unzip(file.path(Dir.Shapes, "LandMask.zip"), exdir = Dir.Shapes) # unzip data
}
LandMask <- readOGR(Dir.Shapes, "ne_10m_land", verbose = FALSE) # read
ggplot() +
geom_polygon(data = LandMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
theme_bw() +
labs(x = "Longitude", y = "Latitude")States & Provinces
Political divisions are often what policy makers are after. Hence, we also download a shapefile for states and provinces across the Globe.
#### STATE MASK (for within-nation borders)
if (!file.exists(file.path(Dir.Shapes, "StateMask.zip"))) { # if not downloaded yet
download.file("https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip",
destfile = file.path(Dir.Shapes, "StateMask.zip")
) # download cultural vector
unzip(file.path(Dir.Shapes, "StateMask.zip"), exdir = Dir.Shapes) # unzip data
}
StateMask <- readOGR(Dir.Shapes, "ne_10m_admin_1_states_provinces", verbose = FALSE) # read
ggplot() +
geom_polygon(data = StateMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
theme_bw() +
labs(x = "Longitude", y = "Latitude")As you can see, this shapefile provides a set of rather small (geographically speaking) regions. We will use these for showing how Kriging works within this tutorial because Kriging runs in a timely manner for these regions.
Additionally, it is worth pointing out that www.naturalearthdata.com offers a large host of further shapefile data sets including (but not limited to):
- National borders
- Rivers and Lakes
- Urban areas
- Reefs and Bathymetry
We strongly recommend looking there early on in your projects for any shapefile needs you may have.
Ecoregions
When we’re not bound by political boundaries, ecoregions can often help to limit the spatial scope of our macroecological studies to manageable sizes (as far as Kriging effort is concerned).
#### ECOREGIONS (for ecoregional borders)
if (!file.exists(file.path(Dir.Shapes, "WWF_ecoregions"))) { # if not downloaded yet
download.file("http://assets.worldwildlife.org/publications/15/files/original/official_teow.zip",
destfile = file.path(Dir.Shapes, "wwf_ecoregions.zip")
) # download regions
unzip(file.path(Dir.Shapes, "wwf_ecoregions.zip"), exdir = file.path(Dir.Shapes, "WWF_ecoregions")) # unzip data
}
EcoregionMask <- readOGR(file.path(Dir.Shapes, "WWF_ecoregions", "official", "wwf_terr_ecos.shp"), verbose = FALSE) # read
ggplot() +
geom_polygon(data = EcoregionMask, aes(x = long, y = lat, group = group), colour = "darkred", fill = "black") +
theme_bw() +
labs(x = "Longitude", y = "Latitude")This particular data set offers the shapes of biomes and biogeographical realms across the Earth. As far as Kriging with the KrigR package is concerned, we heavily advise against Kriging using biogeographical realms without further consideration since these large regions lead to extreme processing requirements.
Plotting Functions
In order to easily visualise our Kriging procedure including (1) inputs, (2) covariates, and (3) outputs without repeating too much of the same code, we create a few plotting functions of our own here.
Don’t worry about understanding how these work off the bat here. Kriging and the package KrigR are what we want to demonstrate here - not visualisation strategies.
Raw Data
This function plots the raw data that we will krig and exports a single plot of all layers of the input raster:
Plot_Raw <- function(Raw, Shp = NULL, Dates, Legend = "Air Temperature [K]") {
Raw_df <- as.data.frame(Raw, xy = TRUE) # turn raster into dataframe
colnames(Raw_df)[c(-1, -2)] <- Dates # set colnames
Raw_df <- gather(data = Raw_df, key = Values, value = "value", colnames(Raw_df)[c(-1, -2)]) # make ggplot-ready
Raw_plot <- ggplot() + # create a plot
geom_raster(data = Raw_df, aes(x = x, y = y, fill = value)) + # plot the raw data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = Legend, colours = inferno(100)) # add colour and legend
if (!is.null(Shp)) { # if a shape has been designated
Raw_plot <- Raw_plot + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
}
return(Raw_plot)
} # export the plotCovariates
This function plots the covariate data we will be using by showing each variable on both resolution levels side-by-side:
Plot_Covs <- function(Covs, Shp = NULL) {
Plots_ls <- as.list(rep(NA, nlayers(Covs[[1]]) * 2)) # create as many plots as there are covariates variables * 2
Counter <- 1 # counter for list position
for (Variable in 1:nlayers(Covs[[1]])) { # loop over all covariate variables
Covs_Iter <- list(Covs[[1]][[Variable]], Covs[[2]][[Variable]]) # extract the data for this variable
for (Plot in 1:2) { # loop over both resolutions for the current variable
Cov_df <- as.data.frame(Covs_Iter[[Plot]], xy = TRUE) # turn raster into data frame
colnames(Cov_df)[3] <- "Values" # assign a column name to the data column
Plots_ls[[Counter]] <- ggplot() + # create plot
geom_raster(data = Cov_df, aes(x = x, y = y, fill = Values)) + # plot the covariate data
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = names(Covs[[Plot]]), colours = cividis(100)) + # add colour and legend
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
if (!is.null(Shp)) { # if a shape has been designated
Plots_ls[[Counter]] <- Plots_ls[[Counter]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
}
Counter <- Counter + 1 # raise list counter
} # end of resolution loop
} # end of variable loop
ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 2, labels = "AUTO") # fuse the plots into one big plot
return(ggPlot)
} # export the plotKriged Products
This function plots the Kriging outputs by splitting them up according to whether they are Kriging predictions or the uncertainties associated with them:
Plot_Krigs <- function(Krigs, Shp = NULL, Dates, Legend = "Air Temperature [K]") {
Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
Plots_ls <- as.list(NA, NA) # this list will be filled with the output plots
for (Plot in 1:2) { # loop over both output types
Krig_df <- as.data.frame(Krigs[[Plot]], xy = TRUE) # turn raster into dataframe
colnames(Krig_df)[c(-1, -2)] <- paste(Type_vec[Plot], Dates) # set colnames
Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1, -2)]) # make ggplot-ready
Plots_ls[[Plot]] <- ggplot() + # create plot
geom_raster(data = Krig_df, aes(x = x, y = y, fill = value)) + # plot the kriged data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = Legend, colours = Colours_ls[[Plot]]) + # add colour and legend
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) # reduce margins (for fusing of plots)
if (!is.null(Shp)) { # if a shape has been designated
Plots_ls[[Plot]] <- Plots_ls[[Plot]] + geom_polygon(data = Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
}
} # end of type-loop
ggPlot <- plot_grid(plotlist = Plots_ls, ncol = 1, labels = "AUTO") # fuse the plots into one big plot
return(ggPlot)
} # export the plotKriging - The Three Steps
Using KrigR in this way has us use the functions download_ERA(), download_DEM(), and krigR(). Running these functions by themselves as opposed to executing the KrigR pipeline (which we cover later in this tutorial) gives you the most control and oversight of the KrigR workflow.
We will start with a simple Kriging process and subsequently make it more sophisticated during this tutorial.
The most simple way in which you can run the functions of the KrigR package is by specifying a rectangular bounding box (i.e. an extent) to specify your study region(s). Here, we will run a small downscaling exercise of my birth-state of Saxony. It is a medium-sized state in the east of Germany and lends itself to some nice statistical downscaling since it presents us with mountainous regions along the south-eastern border and flatland areas towards the north-western edges.
Here’s the full area we will be obtaining and downscaling data for:
Extent <- extent(c(11.8, 15.1, 50.1, 51.7)) # roughly the extent of Saxony
GoogMap <- get_map(c(
left = Extent[1],
right = Extent[2],
bottom = Extent[3],
top = Extent[4]
))
ggmap(GoogMap) +
theme_bw() + labs(x = "Longitude", y = "Latitude")Here, you can see a map of the region with the mountains of the Erzgebirge along the southern border to the Czech Republic as well as the flat terrains around Leipzig in the north of Saxony.
Now, let’s create a separate directory within which we will store the raw ERA5-land data, GMTED2010 DEM covariate data, and Kriging outputs:
Climate Data
No we are ready to carry out our first download of climate reanalysis data through the KrigR package!
For this part of the tutorial, we download air temperature for a three-day interval around my birthday (03-01-1995) using the extent of my home-state of Saxony.
Notice that the downloading of ERA-family reanalysis data may take a short while to start as the download request gets queued with the CDS of the ECMWF before it is executed.
State_Raw <- download_ERA(
Variable = "2m_temperature",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = Extent,
Dir = Dir.StateExt,
API_User = API_User,
API_Key = API_Key
)
Plot_Raw(State_Raw, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))Now, let’s look at the raster that was produced:
## class : RasterBrick
## dimensions : 19, 36, 684, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.1, 0.09999996 (x, y)
## extent : 11.65, 15.25, 49.95, 51.85 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : index_1, index_2, index_3
## min values : 267.8741, 266.8182, 262.5400
## max values : 273.2644, 272.0682, 269.5377
As you can see, we obtained a RasterBrick object with 3 layers of data (one for each day we are interested in). Notice that extent of our downloaded data set does not fit the extent we set earlier manually. This is a precaution we have taken within KrigR to make sure that all data cells you are interested in are covered.
KrigR widens the spatial extent that is specified to ensure full coverage of the respective ERA5(-Land) raster cells. Global downloads are not affected by this and work just as you’d expect.
Additionally, as could be gleamed from the plots above, it is apparent that my home-state got a lot cooler on the day after my birth (04/01/1995) when compared to the two preceding days. What to make of this, I leave up to you because I don’t think I like the interpretation myself if we were to assume causality here.
Keep in mind that every function within the KrigR package produces NetCDF (.nc) files in the specified directory (Dir argument in the function call) to allow for further manipulation outside of R if necessary (for example, using Panoply).
Covariates
Next, we use the download_DEM() function which comes with KrigR to obtain elevation data as our covariate of choice. This produces two rasters:
- A raster of training resolution which matches the input data in all attributes except for the data in each cell
- A raster of target resolution which matches the input data as closely as possible in all attributes except for the resolution (which is specified by the user) and the data in each cell
Both of these products are bundled into a list where the first element corresponds to the training resolution and the second element contains the target resolution covariate data. Here, we specify a target resolution of .02.
Covs_ls <- download_DEM(
Train_ras = State_Raw,
Target_res = .02,
Dir = Dir.StateExt,
Keep_Temporary = TRUE
)
Plot_Covs(Covs_ls)Alternatively to specifying a target resolution, you can specify a different raster which should be matched in all attributes by the raster at target resolution. We get to this again at a later point in this tutorial.
For now, let’s simply inspect our list of covariate rasters:
## [[1]]
## class : RasterLayer
## dimensions : 19, 36, 684 (nrow, ncol, ncell)
## resolution : 0.1, 0.09999996 (x, y)
## extent : 11.65, 15.25, 49.95, 51.85 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : DEM
## values : 59.11391, 926.513 (min, max)
##
##
## [[2]]
## class : RasterLayer
## dimensions : 114, 216, 24624 (nrow, ncol, ncell)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : 11.64986, 15.24986, 49.94986, 51.84986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : DEM
## values : 46.5, 1150.25 (min, max)
This set of covariates has 684 and 24624 cells containing values at training and target resolution, respectively.
Kriging
Now let’s krig these data:
KrigStart <- Sys.time()
State_Krig <- krigR(
Data = State_Raw, # data we want to krig as a raster object
Covariates_coarse = Covs_ls[[1]], # training covariate as a raster object
Covariates_fine = Covs_ls[[2]], # target covariate as a raster object
Keep_Temporary = FALSE, # we don't want to retain the individually kriged layers on our hard-drive
Cores = 1, # we want to krig on just one core
FileName = "State_Ext.nc", # the file name for our full kriging output
Dir = Dir.StateExt # which directory to save our final input in
)
KrigStop <- Sys.time()
Plot_Krigs(State_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))There we go. All the data has been downscaled and we do have uncertainties recorded for all of our outputs. As you can see, the elevation patterns show up clearly in our kriged air temperature output. Furthermore, you can see that our certainty of Kriging predictions drops on the 04/01/1995 in comparison to the two preceding days. However, do keep in mind that a maximum standard error of 0.195, 0.206, 0.333 (for each layer of our output respectively) on a total range of data of 6.553, 6.318, 7.788 (again, for each layer in the output respectively) is evident of a downscaling result we can be confident in.
Now, what does the output actually look like?
State_Krig[-3] # we will talk later about why we leave out the third list element produced by krigR here## $Kriging_Output
## class : RasterBrick
## dimensions : 114, 216, 24624, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : 11.64986, 15.24986, 49.94986, 51.84986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : var1.pred.1, var1.pred.2, var1.pred.3
## min values : 266.8146, 265.6941, 261.6227
## max values : 273.3675, 272.0124, 269.4109
##
##
## $Kriging_SE
## class : RasterBrick
## dimensions : 114, 216, 24624, 3 (nrow, ncol, ncell, nlayers)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : 11.64986, 15.24986, 49.94986, 51.84986 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : var1.stdev.1, var1.stdev.2, var1.stdev.3
## min values : 0.1624252, 0.1706871, 0.2581532
## max values : 0.1951452, 0.2059939, 0.3330932
As output of the krigR() function, we obtain a list of downscaled data as the first element and downscaling standard errors as the second list element.
Finally, let’s see how long it took to carry out this shapefile-informed downscaling:
## Time difference of 1.398976 mins
Kriging - the Pipeline
Now that we’ve seen how you can execute the KrigR workflow using three separate functions, it is time that we show you the most simplified function call to obtain some downscaled products: the pipeline.
First, let’s create a directory which the KrigR pipeline will leak .nc files to. Don’t worry, this leak in the pipeline is very much intentional and of no environmental concern (unless you are running low on disk space).
We have coded the krigR() function in such a way that it can either be addressed at already present spatial products within your R environment, or handle all the downloading and resampling of input data and covariates for you from scratch. To run the exact same Kriging approach as within our extent-example, we can specify the krigR() function as such:
Pipe_Krig <- krigR(
Variable = "2m_temperature",
Type = "reanalysis",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = Extent,
API_User = API_User,
API_Key = API_Key,
Target_res = .02,
Cores = 1,
FileName = "State_Pipe.nc",
Dir = Dir.StatePipe
)
Pipe_Krig[-3] # we will talk later about why we leave out the third list element produced by krigR here
Plot_Krigs(Pipe_Krig, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))Now, let’s check how this differs from the extent-informed output on three separate steps:
## [1] FALSE
Surprise! There is no difference.
Getting More out of KrigR
Kriging By Shapefile
The computational cost of Kriging depends on many factors, one of which is number of observation (i.e. raster cells), which we can limit by using a shapefile.
Let’s revisit our above Saxony-Kriging example, but this time, we will use a shapefile to mask for the exact boundaries of the state itself! We extract the shapefile from the states & provinces data set as follows:
Position <- which(StateMask$name_en == "Saxony") # find Saxony
State_Shp <- StateMask[Position,]
GoogMap <- get_map(c(left = extent(State_Shp)[1],
right = extent(State_Shp)[2],
bottom = extent(State_Shp)[3],
top = extent(State_Shp)[4]))
ggmap(GoogMap) +
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group),
colour = "black", fill = NA, size = 2) +
theme_bw() + labs(x = "Longitude", y = "Latitude") Here, you can see a map of the region with an overlay of the state-borders of Saxony as defined by our shapefile.
Again, we create a directory for all the .nc files that our functions create:
Climate Data
Again, we download air temperature data from the ERA5-Land data set for the three days from the second to the fourth of January 1995. This time, however, we limit the data by shapefile.
State_Raw <- download_ERA(
Variable = "2m_temperature",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = State_Shp,
Dir = Dir.StateShp,
API_User = API_User,
API_Key = API_Key
)
Plot_Raw(State_Raw, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))
Note that we have implemented masking by shapefile in such a way that every cell that is even partially covered by your shapefile is retained in the downloaded ERA5(-Land) data set. Regular masking via the raster package-contained mask() function only retains those cells whose centroids fall within the shapefile boundaries.
Covariates
Again, we obtain elevation data as our covariate of choice:
Covs_ls <- download_DEM(
Train_ras = State_Raw,
Target_res = .02,
Shape = State_Shp,
Dir = Dir.StateShp,
Keep_Temporary = TRUE
)
Plot_Covs(Covs_ls, State_Shp) Notice that despite the covariate rasters (and input rasters, for that matter) containing 578 and 2.080810^{4} for training and target resolution respectively, we only obtain data for 290 and 8723 cells respectively due to our specification of a shapefile.
Kriging
Again, we execute our Kriging functionality using the krigR() function:
KrigStart <- Sys.time() # record when we started
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 1, # only run this on 1 core
FileName = "State_Shape.nc", # save the finished file as this
Dir = Dir.StateShp # where to save the output
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))The results look just like the ones that we obtained without using the shapefile (this time, we only see what’s contained in the shapefile, though). It is noteworthy to highlight the change in uncertainties here. Overall, our uncertainty decreased in going from extent-informed to shapefile-informed kriging in this example. However, we now see some serious chessboard pattern of the uncertainties in our outputs. This is due to the uncertainty decreasing as fine-resolution cells are more closely located towards the centre of a coarse-scaled cell which they have been downscaled from. This behaviour is normal. To obtain more useful kriging standard errors (i.e. uncertainties), we use local Kriging (we get to later in this tutorial).
Let’s see how long this took:
## Time difference of 3.669087 secs
Running this downscaling process for Saxony using the shapefile took 2.6226939 of the original downscaling time using an extent statement.
Using a shapefile rather than an extent statement speeds up the Kriging process!
Multi-Core Kriging
The computational cost of Kriging can be divided up amongst multiple cores using our krigR() function and the Cores argument therein. This enables parallel computation of multi-layer rasters.
Let’s stick with our shapefile-informed downscaling of air temperature records across Saxony. This time, however, we make use of the Cores argument in the krigR() function to run the downscaling of our three time steps worth of data in parallel:
KrigStart <- Sys.time() # record when we started
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 3, # run this on 3 core
FileName = "State_Cores.nc", # save the finished file as this
Dir = Dir.StateShp # where to save the output
)
KrigStop <- Sys.time() # record when the function was done
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))The results look just like the ones that we obtained without using the shapefile and just one-core processing. Let’s see how long this took:
## Time difference of 6.544497 secs
Running this downscaling process for Saxony using the shapefile took 1.7836855 of the original downscaling time using an extent statement (the effects multi-core Kriging really kick in when looking at rasters with many bands because the initial opening of clusters for parallel processing takes some time by itself.).
Using the Cores argument with a value of n (with n > 1) speeds up the Kriging process of multi-layer rasters by an order of n!
Weighted Local Kriging
By default Kriging of the krigR() function uses all cells in a spatial product to downscale individual cells of rasters. The nmax argument can circumvent this.
Let’s build further on our above example by adding the nmax argument (passed on to gstat::krige()) to our krigR() function call. This argument controls how many of the closest cells the Kriging algorithm should consider in the downscaling of individual coarse, training cells.
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 3, # only run this on 1 core
FileName = "State_Local.nc", # save the finished file as this
Dir = Dir.StateShp, # where to save the output
nmax = 15 # how many nearest neighbours to consider in Kriging
)
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"))The air temperature prediction/downscaling results look just like the ones that we obtained above. However, we seriously improved our localised understanding of Kriging uncertainties (i.e. we see much more localised patterns of Kriging standard error). In this case for Saxony, our uncertainties seem to be highest for areas where the landscape is dominated by large, abrupt changes in elevation (e.g. around the Fichtelberg along the southern border) and water-dominated areas such as streams and lakes (e.g. the lakes around Leipzig in the North of Saxony).
Using the nmax argument helps to identify highly localised patterns in the Kriging uncertainty as well as predictions!
We are working on an automatic feature for the identification of an appropriate number of nmax straight from the data itself.
Aggregation of Temporal Resolutions
The KrigR package is capable of some neat pre-processing of raw climate data. - both spatially and temporally.
The functions used for downloading reanalysis data with the KrigR package (download_ERA() and download_UERRA()) accept the following arguments which you can use to control the temporal resolution of your climate data:
- TResolution controls the time-line that TStep indexes. You can specify anything from the following: 'hour', 'day', 'month', or 'year'. The default is 'month'.
- TStep controls how many time-steps to aggregate into one layer of data each. Aggregation is done via taking the mean per cell in each raster comprising time steps that go into the final, aggregated time-step. The default is 1.
Let’s run through a few examples to make clear how desired temporal resolution of data can be achieved using the KrigR package:
| What we want | TResolution |
TStep |
|---|---|---|
| Hourly intervals | 'hour' |
1 / don’t specify |
| 6-hour intervals | 'hour' |
6 |
| Half-day intervals | 'hour' |
12 |
| Daily intervals | 'day' |
1 / don’t specify |
| 3-day intervals | 'day' |
3 |
| Weekly intervals | 'day' |
7 |
| Monthly aggregates | 'month' / don’t specify |
1 / don’t specify |
| 4-month intervals | 'month' / don’t specify |
4 |
| Annual intervals | 'year' |
1 / don’t specify |
| 10-year intervals | 'year' |
10 |
Specifying TResolution of 'month' will result in the download of full month aggregates for every month included in your time series.
For example, DateStart = "2000-01-20", DateStop = "2000-02-20" with TResolution = 'month', and TStep = 1 does not result in the mean aggregate for the month between the 20/01/200 and the 20/02/2000, but does result in the monthly aggregates for January and February 2000. If you desire the former, you would need to specify DateStart = "2000-01-20", DateStop = "2000-02-20" with TResolution = 'day', and TStep = 32 (the number of days between the two dates).
The reanalysis download functions of the KrigR package automatically break down download requests into monthly intervals thus circumventing the danger of running into making a download request that is too big for the CDS.
For example, DateStart = "2000-01-20", DateStop = "2000-02-20" with TResolution = 'day', and TStep = 8 will lead to two download requests to the CDS: (1) hourly data in the range 20/01/2000 00:00 to 31/01/2000 23:00, and (2) hourly data in the range 01/02/2000 00:00 to 20/02/2000 23:00. These data sets are subsequently fused in R, aggregated to daily aggregates, and finally, aggregated to four big aggregates.
This gives you a lot of flexibility, but always keep in mind that third-party data sets might not account for leap-years so make sure the dates of third-party data (should you chose to use some) lines up with the ones as specified by your calls to the functions of the KrigR package.
Use the TResolution, and TStep arguments in the download_ERA(), and download_UERRA() functions of the KrigR package to adjust the temporal resolution of your climate data when downloading reanalysis data with our package.
Kriging to Match Already Available Data
We expect that you won’t want to downscale to specific resolutions most of the time, but rather, match an already existing spatial data set in terms of spatial resolution and extent. Again, the KrigR package got you covered!
Usually, you probably want to krig to match a certain pre-existing data set rather than a certain resolution.
Here, we illustrate this with an NDVI-based example. The NDVI is a satellite-derived vegetation index which tells us how green the Earth is. It comes in bi-weekly intervals and at a spatial resolution of .08333 (roughly 9km). Here, we download all NDVI data for the year 2000 and then create the annual mean. This time, we do so for the state of California because of its size and topographical variety.
Position <- which(StateMask$name_en == "California") # find our state
State_Shp <- StateMask[Position, ] # extract shapefile
if (!file.exists(file.path(Dir.Data, "ReferenceRaster.nc"))) { # if reference raster doesn't already exist
gimms_files <- downloadGimms(
x = as.Date("2000-01-01"), # download from January 2000
y = as.Date("2000-12-31"), # download to December 2000
dsn = Dir.Data, # save downloads in data folder
quiet = FALSE
) # show download progress
Reference_ras <- rasterizeGimms(x = gimms_files, remove_header = TRUE) # rasterize the data
Negatives <- which(values(Reference_ras) < 0) # identify all negative values
values(Reference_ras)[Negatives] <- 0 # set threshold for barren land (NDVI<0)
Reference_ras <- calc(Reference_ras, fun = mean, na.rm = TRUE) # annual mean
Reference_ras <- crop(Reference_ras, State_Shp) # crop to fit shape
Reference_ras <- mask(Reference_ras, State_Shp) # mask by shape
writeRaster(x = Reference_ras, filename = file.path(Dir.Data, "ReferenceRaster"), format = "CDF") # write raster to data directory
unlink(gimms_files, recursive = TRUE) # delete raw GIMMs data
rm(list = c("Negatives", "gimms_files")) # delete from environment
} else { # if reference raster already exists
Reference_ras <- raster(file.path(Dir.Data, "ReferenceRaster.nc")) # GIMMs NDVI data as baseline raster
}So what does this raster look like?
## class : RasterLayer
## dimensions : 114, 124, 14136 (nrow, ncol, ncell)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -124.4167, -114.0833, 32.5, 42 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : C:/Users/au656028/Documents/Work/Active Projects/[R Package] KrigR/KrigR/Workshop/Data/ReferenceRaster.nc
## names : layer
## values : 0.007266667, 0.8722875 (min, max)
## zvar : layer
As stated above, we want to match this with our output.
So, let’s create our directory for this as we did before:
We could do this whole analysis in our three steps as outlined above, but why bother when the pipeline described above gets the job done just as well?
Matching Kriging outputs with a pre-existing data set is as easy as plugging the pre-existing raster into the Target_res argument of the krigR() or the download_DEM() function.
This time we want to downscale from ERA5 resolution (roughly 30km) because the ERA5-Land data already matches the NDVI resolution (roughly 9km). Here’s how we do this:
NDVI_Krig <- krigR(
Variable = "2m_temperature",
Type = "reanalysis",
DataSet = "era5",
DateStart = "2000-01-01",
DateStop = "2000-12-31",
TResolution = "year",
TStep = 1,
Extent = State_Shp,
API_User = API_User,
API_Key = API_Key,
Target_res = Reference_ras,
Cores = 1,
FileName = "AirTemp_NDVI.nc",
nmax = 80, # we are still working on an automatic optimisation of this. Here, let's try a bigger number
Dir = Dir.NDVI
)So? Did we match the pre-existing data?
## class : RasterBrick
## dimensions : 114, 124, 14136, 1 (nrow, ncol, ncell, nlayers)
## resolution : 0.08333333, 0.08333333 (x, y)
## extent : -124.4167, -114.0833, 32.5, 42 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## source : memory
## names : var1.pred
## min values : 271.4913
## max values : 299.1506
We nailed this!
Let’s take one final look at our (A) raw ERA5 data, (B) NDVI data, (C) Kriged ERA5 data, and (D) standard error of our Kriging output:
### ERA-Plot
ERA_df <- as.data.frame(raster(file.path(Dir.NDVI, "2m_temperature_2000-01-01_2000-12-31_year.nc")), xy = TRUE) # turn raster into dataframe
colnames(ERA_df)[c(-1, -2)] <- "Air Temperature 2000 (ERA5)"
ERA_df <- gather(data = ERA_df, key = Values, value = "value", colnames(ERA_df)[c(-1, -2)]) # make ggplot-ready
Raw_plot <- ggplot() + # create a plot
geom_raster(data = ERA_df, aes(x = x, y = y, fill = value)) + # plot the raw data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = "Air Temperature [K]", colours = inferno(100)) + # add colour and legend
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
### NDVI-Plot
NDVI_df <- as.data.frame(Reference_ras, xy = TRUE) # turn raster into dataframe
colnames(NDVI_df)[c(-1, -2)] <- "NDVI 2000"
NDVI_df <- gather(data = NDVI_df, key = Values, value = "value", colnames(NDVI_df)[c(-1, -2)]) # make ggplot-ready
NDVI_plot <- ggplot() + # create a plot
geom_raster(data = NDVI_df, aes(x = x, y = y, fill = value)) + # plot the raw data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = "NDVI", colours = rev(terrain.colors(100))) + # add colour and legend
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
### KRIGED-Plots
Dates <- c("Kriged Air Temperature 2000 (NDVI Resolution)")
Type_vec <- c("Prediction", "Standard Error") # these are the output types of krigR
Colours_ls <- list(inferno(100), rev(viridis(100))) # we want separate colours for the types
Plots_ls <- as.list(NA, NA) # this list will be filled with the output plots
for (Plot in 1:2) { # loop over both output types
Krig_df <- as.data.frame(NDVI_Krig[[Plot]], xy = TRUE) # turn raster into dataframe
colnames(Krig_df)[c(-1, -2)] <- paste(Type_vec[Plot], Dates) # set colnames
Krig_df <- gather(data = Krig_df, key = Values, value = "value", colnames(Krig_df)[c(-1, -2)]) # make ggplot-ready
Plots_ls[[Plot]] <- ggplot() + # create plot
geom_raster(data = Krig_df, aes(x = x, y = y, fill = value)) + # plot the kriged data
facet_wrap(~Values) + # split raster layers up
theme_bw() +
labs(x = "Longitude", y = "Latitude") + # make plot more readable
scale_fill_gradientn(name = "Air Temperature [K]", colours = Colours_ls[[Plot]]) + # add colour and legend
theme(plot.margin = unit(c(0, 0, 0, 0), "cm")) + # reduce margins (for fusing of plots)
geom_polygon(data = State_Shp, aes(x = long, y = lat, group = group), colour = "black", fill = "NA") # add shape
} # end of type-loop
### FINAL PLOT
plot_grid(plotlist = list(Raw_plot, NDVI_plot, Plots_ls[[1]], Plots_ls[[2]]), nrow = 2, labels = "AUTO")The Call List
So far, we have only ever looked at the first two elements in the list returned by krigR(). A quick look at the help file, the code, or this guide reveals that there is a third list element - the call list. When coding this feature into krigR() we intended for this to be a neat, clean, storage-friendly way of keeping track of how the spatial product was created. It does so without storing additional spatial products. Let’s have a look at it:
## $Data
## $Data$Class
## [1] "RasterBrick"
## attr(,"package")
## [1] "raster"
##
## $Data$Dimensions
## $Data$Dimensions$nrow
## [1] 20
##
## $Data$Dimensions$ncol
## [1] 23
##
## $Data$Dimensions$ncell
## [1] 460
##
##
## $Data$Extent
## class : Extent
## xmin : -125.16
## xmax : -113.659
## ymin : 32.282
## ymax : 42.282
##
## $Data$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
##
## $Data$layers
## [1] "index_1"
##
##
## $Covariates_coarse
## $Covariates_coarse$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
##
## $Covariates_coarse$Dimensions
## $Covariates_coarse$Dimensions$nrow
## [1] 20
##
## $Covariates_coarse$Dimensions$ncol
## [1] 23
##
## $Covariates_coarse$Dimensions$ncell
## [1] 460
##
##
## $Covariates_coarse$Extent
## class : Extent
## xmin : -125.16
## xmax : -113.659
## ymin : 32.282
## ymax : 42.282
##
## $Covariates_coarse$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
##
## $Covariates_coarse$layers
## [1] "DEM"
##
##
## $Covariates_fine
## $Covariates_fine$Class
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
##
## $Covariates_fine$Dimensions
## $Covariates_fine$Dimensions$nrow
## [1] 114
##
## $Covariates_fine$Dimensions$ncol
## [1] 124
##
## $Covariates_fine$Dimensions$ncell
## [1] 14136
##
##
## $Covariates_fine$Extent
## class : Extent
## xmin : -124.4167
## xmax : -114.0833
## ymin : 32.5
## ymax : 42
##
## $Covariates_fine$CRS
## CRS arguments: +proj=longlat +datum=WGS84 +no_defs
##
## $Covariates_fine$layers
## [1] "DEM"
##
##
## $KrigingEquation
## ERA ~ DEM
## <environment: 0x000000003940fef8>
##
## $Cores
## [1] 1
##
## $FileName
## [1] "AirTemp_NDVI.nc"
##
## $Keep_Temporary
## [1] TRUE
##
## $nmax
## [1] 80
##
## $Data_Retrieval
## $Data_Retrieval$Variable
## [1] "2m_temperature"
##
## $Data_Retrieval$Type
## [1] "reanalysis"
##
## $Data_Retrieval$DataSet
## [1] "era5"
##
## $Data_Retrieval$DateStart
## [1] "2000-01-01"
##
## $Data_Retrieval$DateStop
## [1] "2000-12-31"
##
## $Data_Retrieval$TResolution
## [1] "year"
##
## $Data_Retrieval$TStep
## [1] 1
##
## $Data_Retrieval$Extent
## class : SpatialPolygonsDataFrame
## features : 1
## extent : -124.4092, -114.1191, 32.53167, 41.99954 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +datum=WGS84 +no_defs
## variables : 83
## names : featurecla, scalerank, adm1_code, diss_me, iso_3166_2, wikipedia, iso_a2, adm0_sr, name, name_alt, name_local, type, type_en, code_local, code_hasc, ...
## value : Admin-1 scale rank, 2, USA-3521, 3521, US-CA, http://en.wikipedia.org/wiki/California, US, 8, California, CA|Calif., NA, State, State, US06, US.CA, ...
This lengthy list should contain all information you need to trace how you created a certain data set using krigR(). If you feel like anything is missing in this list, please contact us.
Using Third-Party Data
Our krigR() function is designed to work with non-ERA climate data as well as non-GMTED2010 covariate data. To krig your own spatial products using different covariate data than the GMTED2010 DEM we use as a default, you need to step into the three-step workflow.
ATTENTION: Kriging only works on square-cell spatial products!
krigR() supports any combination of ERA-family reanalysis, GMTED2010, third-party climate data, and third-party covariate data. Here, we just demonstrate the use of other covariates than the GMTED2010 used by KrigR by default.
For this example, let’s go back to Saxony:
Now, let’s download some covariate data:
- TKSat - thermal conductivity of saturated soil
- TKDry - thermal conductivity of dry soil
Dir.TP <- file.path(Dir.Data, "Third-Party_Data")
dir.create(Dir.TP)
#### Thermal Conductivity of saturated soil
if (!file.exists(file.path(Dir.TP, "TKSat.zip"))) { # if not downloaded yet
download.file("http://globalchange.bnu.edu.cn/download/data/worldptf/tksat.zip",
destfile = file.path(Dir.TP, "TKSat.zip")
) # download cultural vector
unzip(file.path(Dir.TP, "TKSat.zip"), exdir = Dir.TP) # unzip data
}
TKSat_ras <- raster(file.path(Dir.TP, "tksatu_l1.nc"))
#### Thermal Conductivity of dry soil
if (!file.exists(file.path(Dir.TP, "TKDry.zip"))) { # if not downloaded yet
download.file("http://globalchange.bnu.edu.cn/download/data/worldptf/tkdry.zip",
destfile = file.path(Dir.TP, "TKdry.zip")
) # download cultural vector
unzip(file.path(Dir.TP, "TKdry.zip"), exdir = Dir.TP) # unzip data
}
TKDry_ras <- raster(file.path(Dir.TP, "tkdry_l1.nc"))Climate Data
This time, let’s mix it up a bit and look at soil temperature in the uppermost depth layer that ERA5(-Land) recognizes (0-7cm):
State_Raw <- download_ERA(
Variable = "soil_temperature_level_1",
DataSet = "era5-land",
DateStart = "1995-01-02",
DateStop = "1995-01-04",
TResolution = "day",
TStep = 1,
Extent = State_Shp,
Dir = Dir.TP,
API_User = API_User,
API_Key = API_Key
)
Plot_Raw(State_Raw, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"), Legend = "Soil Temperature (0-7cm) [K]")Covariates
As far as covariate data is concerned, we downloaded and loaded into R global data sets which now limit to Saxony:
## TKSat
TKSat_ras <- crop(TKSat_ras, extent(State_Raw)) # crop to the right extent of our data we want to krig
Shape_ras <- rasterize(State_Shp, TKSat_ras, getCover = TRUE) # identify which cells are covered by the shape
Shape_ras[Shape_ras == 0] <- NA # set all cells which the shape doesn't touch to NA
TKSat_ras <- mask(TKSat_ras, Shape_ras) # mask for the shape of the state
## TKDry
TKDry_ras <- crop(TKDry_ras, extent(State_Raw)) # crop to the right extent of our data we want to krig
Shape_ras <- rasterize(State_Shp, TKDry_ras, getCover = TRUE) # identify which cells are covered by the shape
Shape_ras[Shape_ras == 0] <- NA # set all cells which the shape doesn't touch to NA
TKDry_ras <- mask(TKDry_ras, Shape_ras) # mask for the shape of the stateNow we combine these into a list containing to stacks of rasters and plot our third-party covariates:
# resample fine-resolution covariate data to match coarse input data
Coarsecovs <- stack(resample(x = TKSat_ras, y = State_Raw), resample(x = TKDry_ras, y = State_Raw))
names(Coarsecovs) <- c("TKWet", "TKDry") # setting some proer names
# High-resolution to which we downscale
Finecovs <- stack(TKSat_ras, TKDry_ras)
names(Finecovs) <- c("TKWet", "TKDry") # setting some proer names
# combine data into covariate list
Covs_ls <- list(Coarsecovs, Finecovs)
Plot_Covs(Covs_ls, State_Shp)Kriging
Finally, we can krig our soil temperature data using the thermal conductivity of the soil. For this, we need to specify a new KrigingEquation. If we don’t, krigR() will notify us that it our default formula cannot be executed and will attempt to build a formula from the data it can find (this auto-generated formula would be the same as the one we specify here).
State_Krig <- krigR(
Data = State_Raw, # what to krig
Covariates_coarse = Covs_ls[[1]], # covariates at training resolution
Covariates_fine = Covs_ls[[2]], # covariates at target resolution
Keep_Temporary = FALSE, # delete temporary krigs of layers
Cores = 3, # only run this on 1 core
KrigingEquation = "ERA ~ TKDry + TKWet", # this time, we need to specify a new formula. Try and see what happens if you forget to do this.
FileName = "State_TP.nc", # save the finished file as this
Dir = Dir.TP, # where to save the output
nmax = 15 # how many nearest neighbours to consider in Kriging
)
Plot_Krigs(State_Krig, Shp = State_Shp, Dates = c("02-01-1995", "03-01-1995", "04-01-1995"), Legend = "Soil Temperature (0-7cm) [K]")
You can use the KrigingEquation argument in krigR() to specify the use of third-party covariates and/or data to be kriged. You can specify additive as well as interaction effects.